Syntax Analysis using Amazon Comprehend Syntax API with AWS SDK for Python (Boto3)
Amazon Comprehend announced support for Syntax Analysis. In this blog, let's perform syntax analysis using the Amazon Comprehend Syntax API with the AWS SDK for Python (Boto3).
Amazon Comprehend Now Supports Syntax Analysis
Environment
```
$ pip list | grep boto3
boto3    1.9.2
```
Sample Code
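A minimal sketch that calls the `detect_syntax` API on the sentence analyzed in the execution result below might look like the following (the region name and the exact output formatting are assumptions):

```python
import json

import boto3

# Create a Comprehend client (the region here is an assumption; use your own).
comprehend = boto3.client('comprehend', region_name='us-east-1')

# The sentence analyzed in the execution result below.
text = ('Amazon Comprehend is a natural language processing (NLP) service '
        'that uses machine learning to find insights and relationships in text.')

# Call the Syntax API; LanguageCode is required (English in this example).
response = comprehend.detect_syntax(Text=text, LanguageCode='en')

# Pretty-print the full response, including ResponseMetadata.
print(json.dumps(response, indent=2, sort_keys=True))
```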
Execution result
{ "ResponseMetadata": { "HTTPHeaders": { "connection": "keep-alive", "content-length": "2758", "content-type": "application/x-amz-json-1.1", "date": "Wed, 12 Sep 2018 16:35:03 GMT", "x-amzn-requestid": "cc8b7643-b6a9-11e8-9f8f-71568a3ae70c" }, "HTTPStatusCode": 200, "RequestId": "cc8b7643-b6a9-11e8-9f8f-71568a3ae70c", "RetryAttempts": 0 }, "SyntaxTokens": [ { "BeginOffset": 0, "EndOffset": 6, "PartOfSpeech": { "Score": 0.9970498085021973, "Tag": "PROPN" }, "Text": "Amazon", "TokenId": 1 }, { "BeginOffset": 7, "EndOffset": 17, "PartOfSpeech": { "Score": 0.9976467490196228, "Tag": "PROPN" }, "Text": "Comprehend", "TokenId": 2 }, { "BeginOffset": 18, "EndOffset": 20, "PartOfSpeech": { "Score": 0.9982584118843079, "Tag": "VERB" }, "Text": "is", "TokenId": 3 }, { "BeginOffset": 21, "EndOffset": 22, "PartOfSpeech": { "Score": 0.9999969005584717, "Tag": "DET" }, "Text": "a", "TokenId": 4 }, { "BeginOffset": 23, "EndOffset": 30, "PartOfSpeech": { "Score": 0.9993355870246887, "Tag": "ADJ" }, "Text": "natural", "TokenId": 5 }, { "BeginOffset": 31, "EndOffset": 39, "PartOfSpeech": { "Score": 0.996455729007721, "Tag": "NOUN" }, "Text": "language", "TokenId": 6 }, { "BeginOffset": 40, "EndOffset": 50, "PartOfSpeech": { "Score": 0.9889174699783325, "Tag": "NOUN" }, "Text": "processing", "TokenId": 7 }, { "BeginOffset": 51, "EndOffset": 52, "PartOfSpeech": { "Score": 0.9999988079071045, "Tag": "PUNCT" }, "Text": "(", "TokenId": 8 }, { "BeginOffset": 52, "EndOffset": 55, "PartOfSpeech": { "Score": 0.9151285290718079, "Tag": "PROPN" }, "Text": "NLP", "TokenId": 9 }, { "BeginOffset": 55, "EndOffset": 56, "PartOfSpeech": { "Score": 0.9999597072601318, "Tag": "PUNCT" }, "Text": ")", "TokenId": 10 }, { "BeginOffset": 57, "EndOffset": 64, "PartOfSpeech": { "Score": 0.9986529350280762, "Tag": "NOUN" }, "Text": "service", "TokenId": 11 }, { "BeginOffset": 65, "EndOffset": 69, "PartOfSpeech": { "Score": 0.9936331510543823, "Tag": "PRON" }, "Text": "that", "TokenId": 12 }, { "BeginOffset": 70, "EndOffset": 74, "PartOfSpeech": { "Score": 0.9999306201934814, "Tag": "VERB" }, "Text": "uses", "TokenId": 13 }, { "BeginOffset": 75, "EndOffset": 82, "PartOfSpeech": { "Score": 0.9979239702224731, "Tag": "NOUN" }, "Text": "machine", "TokenId": 14 }, { "BeginOffset": 83, "EndOffset": 91, "PartOfSpeech": { "Score": 0.7294206023216248, "Tag": "VERB" }, "Text": "learning", "TokenId": 15 }, { "BeginOffset": 92, "EndOffset": 94, "PartOfSpeech": { "Score": 0.9947968125343323, "Tag": "PART" }, "Text": "to", "TokenId": 16 }, { "BeginOffset": 95, "EndOffset": 99, "PartOfSpeech": { "Score": 0.9998737573623657, "Tag": "VERB" }, "Text": "find", "TokenId": 17 }, { "BeginOffset": 100, "EndOffset": 108, "PartOfSpeech": { "Score": 0.9998371601104736, "Tag": "NOUN" }, "Text": "insights", "TokenId": 18 }, { "BeginOffset": 109, "EndOffset": 112, "PartOfSpeech": { "Score": 0.9999772310256958, "Tag": "CONJ" }, "Text": "and", "TokenId": 19 }, { "BeginOffset": 113, "EndOffset": 126, "PartOfSpeech": { "Score": 0.9998776912689209, "Tag": "NOUN" }, "Text": "relationships", "TokenId": 20 }, { "BeginOffset": 127, "EndOffset": 129, "PartOfSpeech": { "Score": 0.9999299049377441, "Tag": "ADP" }, "Text": "in", "TokenId": 21 }, { "BeginOffset": 130, "EndOffset": 134, "PartOfSpeech": { "Score": 0.9992431402206421, "Tag": "NOUN" }, "Text": "text", "TokenId": 22 }, { "BeginOffset": 134, "EndOffset": 135, "PartOfSpeech": { "Score": 0.9999969005584717, "Tag": "PUNCT" }, "Text": ".", "TokenId": 23 } ] }
You can see that the text is tokenized and each token is labeled with a part of speech, for instance, noun or verb. You can also confirm the confidence score for each label.
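For example, a short loop over the SyntaxTokens list (assuming the response object from the sketch above) prints each token together with its tag and score:

```python
# Print each token's text, part-of-speech tag, and confidence score.
for token in response['SyntaxTokens']:
    pos = token['PartOfSpeech']
    print(f"{token['Text']:<15}{pos['Tag']:<8}{pos['Score']:.4f}")
```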
The parts of speech corresponding to each tag are summarized below.
Tag | Part of speech |
---|---|
ADJ | Adjective |
ADP | Adposition |
ADV | Adverb |
AUX | Auxiliary |
CONJ | Coordinating conjunction |
DET | Determiner |
INTJ | Interjection |
NOUN | Noun |
NUM | Numeral |
O | Other |
PART | Particle |
PRON | Pronoun |
PROPN | Proper noun |
PUNCT | Punctuation |
SCONJ | Subordinating conjunction |
SYM | Symbol |
VERB | Verb |
Please refer to this documentation for details.
Conclusion
Amazon Comprehend's Syntax Analysis tokenizes text and labels each token with its part of speech, along with a confidence score for the label.
In this blog, we illustrated syntax analysis using the Amazon Comprehend Syntax API with the AWS SDK for Python (Boto3).
Please refer to the blogs below about the other features of Amazon Comprehend: Keyphrase Extraction, Sentiment Analysis, Entity Recognition, Language Detection, and Topic Modeling.
How to use Amazon Comprehend operations using the AWS SDK for Python (Boto3)